
5 research outputs found

Short Text Categorization using World Knowledge

Author: Türker Rima
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 26/08/2021

The content of the World Wide Web is multiplying drastically, and thus the amount of available online text data is increasing every day. Today, many users contribute to this massive global network via online platforms by sharing information in the form of short texts. Such an immense amount of data covers subjects from all existing domains (e.g., Sports, Economy, Biology, etc.). Further, manually processing such data is beyond human capabilities. As a result, Natural Language Processing (NLP) tasks, which aim to automatically analyze and process natural language documents, have gained significant attention. Among these tasks, due to its application in various domains, text categorization has become one of the most fundamental and crucial. However, standard text categorization models face major challenges when performing short text categorization, due to the unique characteristics of short texts, i.e., insufficient text length, sparsity, ambiguity, etc. In other words, conventional approaches provide substandard performance when they are directly applied to the short text categorization task. Furthermore, in the case of short text, standard feature extraction techniques such as bag-of-words suffer from limited contextual information. Hence, it is essential to enhance the text representations with an external knowledge source. Moreover, traditional models require a significant amount of manually labeled data, and obtaining labeled data is a costly and time-consuming task. Therefore, although recently proposed supervised methods, especially deep neural network approaches, have demonstrated notable performance, the requirement for labeled data remains the main bottleneck of these approaches. In this thesis, we investigate the main research question of how to perform short text categorization effectively without requiring any labeled data, using knowledge bases as an external source.
In this regard, novel short text categorization models, namely, Knowledge-Based Short Text Categorization (KBSTC) and Weakly Supervised Short Text Categorization using World Knowledge (WESSTEC), have been introduced and evaluated in this thesis. The models do not require any hand-labeled data to perform short text categorization; instead, they leverage the semantic similarity between the short texts and the predefined categories. To quantify such semantic similarity, low-dimensional representations of entities and categories have been learned by exploiting a large knowledge base. To achieve that, a novel entity and category embedding model has also been proposed in this thesis. Extensive experiments have been conducted to assess the performance of the proposed short text categorization models and the embedding model on several standard benchmark datasets.
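
The idea of categorizing a short text by its semantic similarity to predefined categories can be illustrated with a minimal sketch. This is not the KBSTC or WESSTEC model from the thesis: the embedding values below are made up, the entities are assumed to have already been detected in the text, and averaging entity vectors followed by cosine similarity is an assumption chosen for simplicity. The names ENTITY_VECS, CATEGORY_VECS, cosine and categorize are hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings; the values are invented for
# illustration and are not taken from any trained model.
ENTITY_VECS = {
    "Lionel Messi": [0.9, 0.1, 0.0],
    "FC Barcelona": [0.8, 0.2, 0.1],
    "Stock market": [0.1, 0.9, 0.2],
}
CATEGORY_VECS = {
    "Sports": [1.0, 0.0, 0.0],
    "Economy": [0.0, 1.0, 0.0],
    "Biology": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def categorize(entities):
    """Return the category closest to the mean vector of the known entities.

    Entities without an embedding are ignored; None is returned when no
    entity is known, since no similarity can be computed in that case.
    """
    vecs = [ENTITY_VECS[e] for e in entities if e in ENTITY_VECS]
    if not vecs:
        return None
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return max(CATEGORY_VECS, key=lambda c: cosine(mean, CATEGORY_VECS[c]))


print(categorize(["Lionel Messi", "FC Barcelona"]))
print(categorize(["Stock market"]))
```

No labeled training texts appear anywhere in the sketch: the only inputs are the entity and category vectors, which in the setting described above would be learned from a knowledge base rather than written by hand.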

KITopen

Entity-Based Short Text Classification Using Convolutional Neural Networks

Author: Alam Mehwish
Bie Qingyuan
Sack Harald
Türker Rima
Publication venue: Springer Verlag
Publication date: 01/01/2020

KITopen

"The Less Is More" for Text Classification

Author: Koutraki Maria
Sack Harald
Türker Rima
Zhang Lei
Publication venue: RWTH Aachen
Publication date: 01/01/2018

KITopen

TECNE: Knowledge based text classification using network embeddings

Author: Koutraki Maria
Sack Harald
Türker Rima
Zhang Lei
Publication venue: RWTH Aachen
Publication date: 01/01/2018

Text classification is an important and challenging task due to its application in various domains such as document organization and news filtering. Several supervised learning approaches have been proposed for text classification. However, most of them require a significant amount of training data. Manually labeling such data can be very time-consuming and costly. To overcome the lack of labeled data, we demonstrate TECNE, a knowledge-based text classification method using network embeddings. The proposed system does not require any labeled training data to classify an arbitrary text. Instead, it relies on the semantic similarity between entities appearing in a given text and a set of predefined categories to determine the category to which the given document belongs.

KITopen

ESWC 2023 Workshops and Tutorials Joint Proceedings: Joint Proceedings of the ESWC 2023 Workshops and Tutorials co-located with the 20th European Semantic Web Conference (ESWC 2023)

Author: Aebeloe Christian
Alam Mehwish
Aras Hidir
Azzam Amr
Cano Juan
Domingue John
Gottschalk Simon
Hartig Olaf
Hertling Sven
Hose Katja
Kirrane Sabrina
Lisena Pasquale
Osborne Francesco
Pesquita Catia
Rohde Philipp
Steels Luc
Taelman Ruben
Third Aisling
Tiddi Ilaria
Trojahn Cassia
Türker Rima
Publication venue: HAL CCSD
Publication date: 19/08/2023

Joint Proceedings of the ESWC 2023 Workshops and Tutorials co-located with the 20th European Semantic Web Conference (ESWC 2023)

Scientific Publications of the University of Toulouse II Le Mirail